TIMING PERFORMANCE ANALYSIS



FIELD OF THE INVENTION 

[0001] The present invention relates generally to timing
performance analysis, and more particularly to timing
performance analysis for an integrated circuit comprising an
embedded device.

BACKGROUND OF THE INVENTION 

[0002] The process for producing an integrated circuit 
comprises many steps. Conventionally, a logic design is 
followed by a circuit design, which is followed by a layout 
design. With respect to the circuit design and layout 
portion, once circuits for an integrated circuit have been 
designed, such designs are converted to a physical 
representation known as a "circuit layout" or "layout." 
Layout is exceptionally important to developing a working 
design as it affects many aspects, including, but not limited 
to, signal noise, signal time delay, resistance, cell area, 
and parasitic effects.

[0003] Once a circuit is designed and laid out, it is 
often simulated to ensure performance criteria are met, 
including, but not limited to, signal timing. This type of 
analysis is difficult at the outset, and is made more 
difficult by an embedded design. An embedded design or 
embedded circuit is conventionally designed separately from 
the integrated circuit in which it is embedded. Sometimes
such an embedded circuit is referred to as an intellectual
property (IP) core or embedded core, because the information
to build and test such an embedded circuit is provided from
one company to another.

[0004] An IP core may have a certain maximum timing
performance for input and output. For example, a
microprocessor will have a certain maximum timing performance
for input and output of data and other information to a
memory, or more particularly, a memory controller. In
personal computer manufacture, operation of memory, or more
particularly of memory modules, is specified for a bus "speed,"
such as 33 MHz, 66 MHz, and so on. Presently, the Rambus
Signaling Level road map is for a memory-to-processor bus
frequency of 1.2 GHz. However, processors presently operate
at speeds in excess of 1.2 GHz, and thus processors must be
slowed down for communicating with memory. Moreover, memory
is speed graded, and conventionally slower memory costs less
than faster memory.

[0005] However, there is no de facto standard bus
interface for an embedded microprocessor. Accordingly, glue
or gasket logic and/or interconnects are used to couple an
embedded microprocessor to a host device, such as a
programmable logic device. Programmable logic devices are a
well-known type of integrated circuit that may be
programmed by a user to perform specified logic functions.
There are different types of programmable logic devices, such
as programmable logic arrays (PLAs) and complex programmable
logic devices (CPLDs). One type of programmable logic
device, called a field programmable gate array (FPGA), is
very popular because of its superior combination of capacity,
flexibility and cost.

[0006] Accordingly, it would be desirable and useful to
provide a method and apparatus for timing performance analysis
for an embedded device.

SUMMARY OF THE INVENTION 

[0007] An aspect of the present invention is a method for
performing a timing analysis for a core device to be embedded
in a host integrated circuit. Clock-to-output timing
information is obtained for the core device. Setup and hold
timing information and delay timing information are determined
for a portion of the host integrated circuit. The
clock-to-output timing information, the setup and hold timing
information and the delay timing information are associated
with respective signals, and a path time delay for each of
the respective signals is calculated.

[0008] An aspect of the present invention is a method for
performing a timing analysis for a core device in a host
integrated circuit. Setup and hold timing information is
obtained for the core device. Clock-to-output timing
information and delay timing information are determined for a
portion of the host integrated circuit. The clock-to-output
timing information, the setup and hold timing information and
the delay timing information are associated with respective
signals, and a path time delay for each of the respective
signals is calculated.

[0009] An aspect of the present invention is a method for
determining timing performance. Clock-to-output times for a
processor core are obtained. Static timing analysis is used
to determine timing data for a memory controller. Setup and
hold times are obtained from the timing data for the memory
controller. A programmatic representation of logic and
interconnects for coupling the memory controller and the
processor core is provided. The programmatic representation
of logic and interconnects is simulated to obtain delay
times. The delay times, the setup and hold times and the
clock-to-output times are used as inputs to a spreadsheet,
and path times are determined from the spreadsheet.
[0010] An aspect of the present invention is a method for
determining timing performance. Setup and hold times for a
processor core are obtained. Static timing analysis is used
to determine timing data for a memory controller.
Clock-to-output times are obtained from the timing data for the
memory controller. A programmatic representation of logic and
interconnects for coupling the memory controller and the
processor core is provided. The programmatic representation
of logic and interconnects is simulated to obtain delay
times. The delay times, the setup and hold times and the
clock-to-output times are provided as input to a spreadsheet,
and path times are determined from the spreadsheet.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] So that the manner in which the above recited 
features, advantages and objects of the present invention are 
attained and can be understood in detail, a more particular 
description of the invention, briefly summarized above, may 
be had by reference to the embodiments thereof which are 
illustrated in the appended drawings.

[0012] It is to be noted, however, that the appended 
drawings illustrate only typical embodiments of this 
invention and are therefore not to be considered limiting of 
its scope, for the present invention may admit to other 
equally effective embodiments. 

[0013] FIG. 1 is a block diagram of an exemplary portion 
of an embodiment of an integrated circuit in accordance with 
one or more aspects of the present invention. 
[0014] FIG. 2 is a timing diagram for the integrated 
circuit portion of FIG. 1. 

[0015] FIGS. 3 and 4 are flow diagrams of respective
exemplary embodiments of timing performance analysis processes
for output and input paths, respectively, for the integrated
circuit of FIG. 1 in accordance with one or more aspects of
the present invention.

DETAILED DESCRIPTION OF THE DRAWINGS 

[0016] In the following description, numerous specific
details are set forth to provide a more thorough
understanding of the present invention. However, it will be
apparent to one of skill in the art that the present
invention may be practiced without one or more of these
specific details. In other instances, well-known features
have not been described in order to avoid obscuring the
present invention.

[0017] Referring to FIG. 1, there is shown a block diagram 
of an exemplary portion of an embodiment of an integrated 
circuit 100. Integrated circuit 100 comprises an embedded 
core 110, such as an embedded microprocessor core, on-chip
memory controller (OCM) 120, and gasket or glue logic
("G-logic") and interconnects 115-A and 115-B. Integrated circuit
100 may be a programmable logic device, such as an FPGA. 
Accordingly, OCM 120 may be programmed from FPGA circuit 
fabric, or may be a dedicated memory controller circuit, or a 
combination thereof. Furthermore, FPGAs conventionally 
comprise memory and a memory controller, and thus such a 
memory controller may be used to form at least a portion of 
OCM 120. Integrated circuit 100 may be formed after a timing 
analysis in accordance with one or more aspects of the 
present invention. 

[0018] There are two signal paths to and from embedded
core 110, namely input path 113 and output path 114. Clock
signal 109 is provided to embedded core 110 and OCM 120.
Each signal path 113, 114 represents provisioning of data,
control and address information to and from embedded core
110. Accordingly, each signal path 113, 114 represents more
than one signal path. Notably, the maximum time allowed
without downgrading speed is:

$$T_{\mathrm{path}} = \frac{1}{f_{\min}} \tag{1}$$

where fmin represents a minimum acceptable operating frequency
for a system. Notably, fmin may be set equivalent to a
maximum operating frequency for transferring information to
and from embedded core 110. Thus, embedded core 110 is used
to determine at least an initial value of fmin. Because there
is more than one signal, each signal will have an associated
Tpath. Though the Tpath delay may be the same for two or more
signals, it may also be different, depending on routing
circuitry and the like. Thus, Tpath is evaluated for each
signal to determine Tpath for a system. However, as the
analysis for each signal is the same, only one input signal and
one output signal are described in the exemplary timing diagram
of FIG. 2 for purposes of clarity.
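
Equation (1) is a simple reciprocal, but it helps to see it in the picosecond units later used for Tables I and II. The following Python sketch is illustrative only; the function name max_path_time_ps and the 200 MHz target are hypothetical and are not values from this embodiment.

```python
def max_path_time_ps(f_min_hz: float) -> float:
    """Equation (1): Tpath = 1 / fmin, converted from seconds to picoseconds."""
    if f_min_hz <= 0:
        raise ValueError("fmin must be a positive frequency in hertz")
    return 1e12 / f_min_hz


if __name__ == "__main__":
    # Hypothetical fmin of 200 MHz gives a 5 ns (5000 ps) budget per signal path.
    print(max_path_time_ps(200e6))  # 5000.0
```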

[0019] Referring to FIG. 2, there is shown a timing
diagram for integrated circuit 100 of FIG. 1. With continuing
reference to FIG. 2 and renewed reference to FIG. 1, input
path 113 comprises delays that go into determining Tpath. OCM
output signal 131 has a clock-to-output delay (C-O) 134.
This means that, from a first triggering edge 109-1 of clock
signal 109 to OCM 120, a signal 131 to be outputted from OCM
120 is delayed by clock-to-output delay 134 before it is
outputted from OCM 120, as indicated by transition 131-1.
Another delay in determining Tpath is that caused by routing
OCM output signal 131 through any G-logic and interconnect
115-B present on input path 113. Accordingly, G-logic and
interconnect (GL&I) output signal 132 is delayed by GL&I
delay 135 with respect to OCM output signal 131, as indicated
by transition 132-1. Notably, OCM output signal 131 and
GL&I output signal 132 may be the same signal, in which case
transition 132-1 is simply transition 131-1 further delayed.
[0020] Additionally, embedded core 110 has one or
more setup and hold times. Accordingly, the setup time for a
signal incoming to embedded core 110 must be met before the
next triggering edge 109-2 of clock signal 109. Embedded
core input signal 133 is equivalent to GL&I output signal 
132, as indicated by each having transition 132-1. However, 
embedded core input signal 133 is used in FIG. 2 to clearly 
delineate setup and hold time (Setup Time) 136 as measured 
from transition 132-1 of core input signal 133 to triggering 
edge 109-2 of clock signal 109. 
[0021] It should be understood that for this embodiment
Tpath is to be less than one period of clock signal 109 to
ensure integrated circuit 100 may operate at fmin of embedded
core 110. In other words, it may be a goal to have clock
signal 109 with a frequency of fmin. This is important
because embedded core 110 may be provided as a "hard macro,"
namely, a fixed layout formed with a set minimum lithographic
feature size. In other words, if embedded core 110 cannot
be changed, then the optimum operating frequency of
embedded core 110 is a fixed target operating speed. Notably,
though the embodiment described herein uses single data rate
timing, the present invention may be used with double data
rate timing.

[0022] Output path 114 has delays similar to those of
input path 113. Accordingly, embedded core 110 provides core
output signal 121 delayed by a clock-to-output delay 124 as
measured from triggering edge 109-1 of clock signal 109 to
transition 121-1 of output signal 121. Measured from
transition 121-1 to transition 122-1 is GL&I delay 125 of 
GL&I output signal 122 due to G-logic and/or interconnect 
115-A. OCM input signal 123 is equivalent to GL&I output 
signal 122, as indicated by each having transition 122-1. 
However, OCM input signal 123 is used in FIG. 2 to clearly 
delineate setup time 126 as measured from transition 122-1 of 
OCM input signal 123 to triggering edge 109-2 of clock signal 
109. 
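
Taken together, FIG. 2 and equation (1) amount to a per-signal timing budget for each path. As an illustrative restatement only, writing t_n for the interval labeled n in FIG. 2 (a notation introduced here, not used in the figure) and totaling the terms the same way as the spreadsheet approach of FIGS. 3 and 4:

$$
\begin{aligned}
T_{\mathrm{in}}  &= t_{134} + t_{135} + t_{136} \le T_{\mathrm{path}} = \frac{1}{f_{\min}} \\
T_{\mathrm{out}} &= t_{124} + t_{125} + t_{126} \le T_{\mathrm{path}}
\end{aligned}
$$

where t_134 and t_124 are the clock-to-output delays of OCM 120 and embedded core 110, t_135 and t_125 are the GL&I delays through 115-B and 115-A, and t_136 and t_126 are the setup times of embedded core 110 and OCM 120, respectively.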

[0023] Conventionally, embedded core 110 is provided with
performance data including setup and hold times and
clock-to-output times. These times may be provided in a known
format, such as Standard Delay Format (SDF). The flow diagrams
of FIGS. 3 and 4 are described below on the assumption that
setup and hold times and clock-to-output times for embedded
core 110 are provided or determined, such as from simulation
or testing prior to embedding.
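
Because the core's figures often arrive in SDF, a small extraction step can feed them into the per-signal analysis of FIGS. 3 and 4. The Python sketch below is a simplified illustration, not a conforming SDF parser: it assumes a single timescale (taken here as picoseconds), recognizes only plain IOPATH, SETUP and HOLD entries written as in the sample, and keeps the third (max) value of each (min:typ:max) triple as the worst case. The cell, instance and port names in the sample are hypothetical.

```python
import re

# (IOPATH (posedge CLK) DATA_OUT1 (min:typ:max))
IOPATH_RE = re.compile(
    r"\(IOPATH\s+\((?:posedge|negedge)\s+\w+\)\s+(\w+)\s+"
    r"\(([\d.]+):([\d.]+):([\d.]+)\)"
)
# (SETUP DATA_IN1 (posedge CLK) (min:typ:max)); HOLD has the same shape
CHECK_RE = re.compile(
    r"\((SETUP|HOLD)\s+(\w+)\s+\((?:posedge|negedge)\s+\w+\)\s+"
    r"\(([\d.]+):([\d.]+):([\d.]+)\)"
)

SAMPLE_SDF = """
(CELL (CELLTYPE "core") (INSTANCE core110)
  (DELAY (ABSOLUTE
    (IOPATH (posedge CLK) DATA_OUT1 (90:95:100))))
  (TIMINGCHECK
    (SETUP DATA_IN1 (posedge CLK) (20:22:25))
    (HOLD DATA_IN1 (posedge CLK) (5:8:10))))
"""


def read_core_timing(sdf_text: str) -> dict:
    """Collect worst-case (max) clock-to-output, setup and hold values per port."""
    timing = {"clock_to_output": {}, "setup": {}, "hold": {}}
    for port, _min, _typ, worst in IOPATH_RE.findall(sdf_text):
        timing["clock_to_output"][port] = float(worst)
    for kind, port, _min, _typ, worst in CHECK_RE.findall(sdf_text):
        timing[kind.lower()][port] = float(worst)
    return timing


if __name__ == "__main__":
    print(read_core_timing(SAMPLE_SDF))
```

Production flows would normally rely on the timing tool's own SDF reader; the regular expressions here only show which fields the later steps consume.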

[0024] Referring to FIG. 3, there is shown a flow diagram
of an exemplary embodiment of a timing performance analysis
process 300 in accordance with one or more aspects of the
present invention. With continuing reference to FIG. 3 and
renewed reference to FIG. 1, timing performance analysis
process 300 is described. Timing performance analysis
process 300 is for output path 114.

[0025] At step 301, clock-to-output times are obtained for
embedded core 110. At step 302, a static timing analysis is
performed on OCM 120. This static timing analysis is done by
simulation at a transistor level, and may be done with a
product called PathMill from Synopsys of Mountain View,
California. At step 303, data from step 302 is used to
determine respective setup and hold times for signals to be
inputted to OCM 120.

[0026] At step 304, a programmatic representation of
gasket logic and interconnects 115-A is provided. Such a
representation may be written in Verilog or VHDL, for example.
At step 305, this programmatic representation is taken down
from a logic level to a level closer to a physical or
transistor level, such as with HSPICE or a similar simulation
program, and simulated to get delays associated with
signals passing through gasket logic and interconnects 115-A.
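
Step 305 yields waveforms rather than a single number, so some measurement rule is needed to turn them into a GL&I delay. One common convention, assumed here rather than taken from this embodiment, is the 50%-to-50% propagation delay: the time from the input crossing half the supply voltage to the output doing the same. A minimal Python sketch with linear interpolation between samples follows; the function names and the waveform values are hypothetical.

```python
def crossing_time(times, volts, threshold, rising=True):
    """First time the sampled waveform crosses `threshold`, linearly interpolated."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < threshold <= v1) if rising else (v0 > threshold >= v1)
        if crossed:
            t0, t1 = times[i - 1], times[i]
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the threshold")


def glue_delay(t_in, v_in, t_out, v_out, vdd, in_rising=True, out_rising=True):
    """50%-to-50% delay through gasket logic and interconnect (units of the time axis)."""
    half = vdd / 2.0
    return (crossing_time(t_out, v_out, half, out_rising)
            - crossing_time(t_in, v_in, half, in_rising))


if __name__ == "__main__":
    t = [0, 10, 20, 30, 40, 50, 60, 70, 80]            # ps, hypothetical samples
    v_in = [0, 0, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5]   # volts at the gasket input
    v_out = [0, 0, 0, 0, 0, 0, 0, 1.5, 1.5]            # volts at the gasket output
    print(glue_delay(t, v_in, t, v_out, vdd=1.5))      # 50.0
```

The separate in_rising and out_rising flags allow for inverting gasket logic, where an input rising edge produces an output falling edge.
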
[0027] At step 306, outputs from steps 303 and 305, 
namely, setup and hold times for OCM 120 and signal delays 
for gasket logic and interconnects 115-A, respectively, are 
associated with clock-to-output times for embedded core 110 
from step 301. This association may be done using a 
spreadsheet, a database and the like. For example, assuming
data_out_1 from embedded core 110 is under consideration,
a spreadsheet association may look something like that
shown in Table I:



Table I

Signal    C-O    GL&I Delay    Setup/Hold    Total Time
DO1       100    50            25            175






X-1081 US 



PATENT 



where all values are expressed in units of time, such as 
picoseconds for example. 
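
The association of step 306 and the totaling of step 307 amount to a join on signal name followed by a row sum. The Python sketch below mirrors the spreadsheet columns of Table I; the names PathRow and associate are hypothetical, and a missing value for any signal is treated as an error rather than silently skipped. Because the columns are generic (the launching block's C-O, the GL&I delay, and the capturing block's setup/hold), the same sketch would apply to input path 113 with the roles of embedded core 110 and OCM 120 swapped.

```python
from dataclasses import dataclass


@dataclass
class PathRow:
    signal: str
    clock_to_output: float  # C-O of the launching block
    glue_delay: float       # GL&I delay through gasket logic and interconnect
    setup_hold: float       # setup/hold requirement of the capturing block

    @property
    def total(self) -> float:
        """Step 307: Ti = C-O + GL&I delay + Setup/Hold."""
        return self.clock_to_output + self.glue_delay + self.setup_hold


def associate(c_o: dict, glue: dict, setup_hold: dict) -> list:
    """Step 306: join per-signal values by signal name, one row per signal."""
    common = set(c_o) & set(glue) & set(setup_hold)
    missing = (set(c_o) | set(glue) | set(setup_hold)) - common
    if missing:
        raise KeyError(f"signals lacking timing data: {sorted(missing)}")
    return [PathRow(s, c_o[s], glue[s], setup_hold[s]) for s in sorted(common)]


if __name__ == "__main__":
    # Values from Table I (picoseconds).
    for row in associate({"DO1": 100}, {"DO1": 50}, {"DO1": 25}):
        print(row.signal, row.total)  # DO1 175
```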

[0028] At step 307, critical paths are identified by
totaling C-O delay, GL&I delay and Setup/Hold time to provide
a total time for each signal traveling along output path 114.
Accordingly, a total time, Ti, is determined for each signal
on output path 114 going from embedded core 110 to OCM 120.
[0029] At step 308, Ti is compared to Tpath. For example,
it may be determined whether Tpath is greater than or equal to
Ti for each signal. Notably, it should be appreciated that
if Ti were equal to Tpath, then there would be "critical"
timing. Accordingly, at step 308, such a check may be for
Tpath greater than Ti to avoid critical timing. Moreover, to
ensure a margin of error, T'path, which is approximately 1 to
10 percent, for example, less than Tpath, may be used at step
308 for comparison with Ti. For purposes of clarity, the
remainder of FIG. 3 is described as though Ti must be less
than or equal to Tpath, though it should be understood that
other comparisons may be used.
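
The comparison of step 308, including the optional guard band that turns Tpath into T'path, can be sketched as a filter over the per-signal totals. In this Python illustration the margin is expressed as a fraction (0.01 to 0.10 corresponding to the 1 to 10 percent mentioned above); the function name critical_signals and the second signal DO2 in the usage are hypothetical.

```python
def critical_signals(totals: dict, t_path: float, margin: float = 0.0) -> list:
    """Step 308: return signals whose Ti exceeds T'path = Tpath * (1 - margin)."""
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be a fraction in [0, 1)")
    t_path_prime = t_path * (1.0 - margin)
    return sorted(sig for sig, t_i in totals.items() if t_i > t_path_prime)


if __name__ == "__main__":
    totals = {"DO1": 175.0, "DO2": 190.0}  # ps; DO2 is hypothetical
    print(critical_signals(totals, t_path=200.0))               # []
    print(critical_signals(totals, t_path=200.0, margin=0.10))  # ['DO2']
```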

[0030] Alternatively, timing performance analysis process
300 may end at step 307, because the largest of the times Ti
may be determined and used to set the frequency of operation
of output path 114 of embedded core 110.
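
When process 300 stops at step 307, the result of interest is the largest Ti and the operating frequency it permits. A short Python sketch of that final step follows; it assumes times in picoseconds, and the function name limiting_signal and signal DO2 are hypothetical.

```python
def limiting_signal(totals_ps: dict) -> tuple:
    """Return (signal with the largest Ti, highest frequency in Hz that Ti allows)."""
    if not totals_ps:
        raise ValueError("no signals to analyze")
    worst = max(totals_ps, key=totals_ps.get)
    return worst, 1e12 / totals_ps[worst]


if __name__ == "__main__":
    sig, f_hz = limiting_signal({"DO1": 175.0, "DO2": 160.0})
    print(sig, round(f_hz / 1e9, 2), "GHz")  # DO1 5.71 GHz
```
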
[0031] However, assuming either or both OCM 120 and gasket
logic and interconnects 115-A may be modified, if any Ti is
greater than Tpath, then at step 310 circuitry from either or
both OCM 120 and gasket logic and interconnects 115-A is
modified to reduce time associated with identified critical
paths, namely, signal paths producing Ti's greater than Tpath.
In response to modification of circuitry at step 310, layout
for such modified circuitry is made at step 311 and circuitry
values associated therewith including, but not limited to,
resistance and capacitance, among others both actual and
parasitic, are extracted. Modified circuitry and associated
circuitry values are fed back at steps 302 and 304, as
applicable. For example, if modification of gasket logic and
interconnects 115-A results in no change to OCM 120, then
there is nothing to feed back for OCM 120, and vice versa with
respect to a change to OCM 120 resulting in no change to gasket
logic and interconnects 115-A. Of course, modification may be
made to both OCM 120 and gasket logic and interconnects 115-A,
resulting in feedback for both.
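
Steps 302 through 311 form a loop that repeats until no critical path remains. The Python sketch below captures only that control flow. The analyze and modify callables are hypothetical stand-ins for the static timing analysis, simulation, layout and extraction tools, the toy model in demo is invented for illustration, and the iteration limit is an added safeguard not described in the text.

```python
from typing import Callable, Dict, List


def close_timing(
    analyze: Callable[[], Dict[str, float]],
    modify: Callable[[List[str]], None],
    t_path: float,
    max_iterations: int = 10,
) -> Dict[str, float]:
    """Re-analyze (steps 302-307) and modify (steps 310-311) until every
    Ti <= Tpath, at which point step 308 passes and the process ends (step 309)."""
    for _ in range(max_iterations):
        totals = analyze()
        critical = sorted(s for s, t in totals.items() if t > t_path)
        if not critical:
            return totals
        modify(critical)  # change OCM and/or gasket logic, then re-extract
    raise RuntimeError("timing not closed within the iteration limit")


def demo() -> Dict[str, float]:
    """Toy model: each modification trims 20 ps of GL&I delay on critical signals."""
    glue = {"DO1": 90.0}

    def analyze():
        return {s: 100.0 + d + 25.0 for s, d in glue.items()}

    def modify(critical):
        for s in critical:
            glue[s] -= 20.0

    return close_timing(analyze, modify, t_path=180.0)


if __name__ == "__main__":
    print(demo())  # {'DO1': 175.0}
```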

[0032] Timing performance analysis process 300 may
continue until, at step 308, each Ti is less than or equal to
Tpath, in which event timing performance analysis process 300
ends at step 309. Notably, timing performance analysis
process 300 works with embedded core 110 formed with a
lithography of a first minimum dimension and OCM 120/gasket
logic and interconnects 115-A formed with a lithography of a
second minimum dimension different from the first minimum
dimension. So, for example, embedded core 110 may be formed
using 0.13 micron lithography and OCM 120/gasket logic and
interconnects 115-A may be formed using 0.18 micron
lithography.

[0033] Referring to FIG. 4, there is shown a flow diagram 
of an exemplary embodiment of a timing performance analysis 
process 400 in accordance with one or more aspects of the 
present invention. With continuing reference to FIG. 4 and 
renewed reference to FIG. 1, timing performance analysis 
process 400 is described. Timing performance analysis 
process 400 is for input path 113. 

[0034] At step 401, setup and hold times are obtained for
embedded core 110. At step 402, a static timing analysis is
performed on OCM 120. This static timing analysis is done by
simulation at a transistor level, and may be done with a
product called PathMill from Synopsys of Mountain View,
California. At step 403, data from step 402 is used to
determine respective clock-to-output times for signals to be
outputted from OCM 120.
[0035] At step 404, a programmatic representation of
gasket logic and interconnects 115-B is provided. Such a
representation may be written in Verilog or VHDL, for example.
At step 405, this programmatic representation is taken down
from a logic level to a level closer to a physical or
transistor level, such as with HSPICE or a similar simulation
program, and simulated to get delays associated with
signals passing through gasket logic and interconnects 115-B.
[0036] At step 406, outputs from steps 403 and 405,
namely, clock-to-output times for OCM 120 and signal delays
for gasket logic and interconnects 115-B, respectively, are
associated with setup and hold times for embedded core 110
from step 401. This association may be done using a
spreadsheet, a database and the like. For example, assuming
data_in_1 to embedded core 110 is under consideration, a
spreadsheet association may look something like that shown in
Table II:



Table II

Signal    C-O    GL&I Delay    Setup/Hold    Total Time
DI1       150    50            25            225



where all values are expressed in units of time, such as
picoseconds for example. 

[0037] At step 407, critical paths are identified by
totaling C-O delay, GL&I delay and Setup/Hold time to provide
a total time for each signal traveling along input path 113.
Accordingly, a total time, Ti, is determined for each signal
on input path 113 going from OCM 120 to embedded core 110.
[0038] At step 408, Ti is compared to Tpath. For example,
it may be determined whether Tpath is greater than or equal to
Ti for each signal. Notably, it should be appreciated that
if Ti were equal to Tpath, then there would be "critical"
timing. Accordingly, at step 408, such a check may be for
Tpath greater than Ti to avoid critical timing. Moreover, to
ensure a margin of error, T'path, which is approximately 1 to
10 percent, for example, less than Tpath, may be used at step
408 for comparison with Ti. For purposes of clarity, the
remainder of FIG. 4 is described as though Ti must be less
than or equal to Tpath, though it should be understood that
other comparisons may be used.

[0039] Alternatively, timing performance analysis process
400 may end at step 407, because the largest of the times Ti
may be determined and used to set the frequency of operation
of input path 113 of embedded core 110.
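
Because clock signal 109 is shared by both paths and Tpath for the system is set by evaluating every signal, the slower of the two paths governs when processes 300 and 400 both stop at their totaling steps. As a worked illustration only, using the example values of Tables I and II:

$$
f_{\max} = \frac{1}{\max\left(T_{\mathrm{DO1}},\, T_{\mathrm{DI1}}\right)}
= \frac{1}{\max(175,\, 225)\ \mathrm{ps}}
= \frac{1}{225\ \mathrm{ps}} \approx 4.44\ \mathrm{GHz}
$$

so in this example input path 113, not output path 114, limits the operating frequency.
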
[0040] However, assuming either or both OCM 120 and gasket
logic and interconnects 115-B may be modified, if any Ti is
greater than Tpath, then at step 410 circuitry from either or
both OCM 120 and gasket logic and interconnects 115-B is
modified to reduce time associated with identified critical
paths, namely, signal paths producing Ti's greater than Tpath.
In response to modification of circuitry at step 410, layout
for such modified circuitry is made at step 411 and circuitry
values associated therewith including, but not limited to,
resistance, capacitance, and inductance, among others both
actual and parasitic, are extracted. Modified circuitry and
associated circuitry values are fed back at steps 402 and
404, as applicable. For example, if modification of gasket
logic and interconnects 115-B results in no change to OCM 120,
then there is nothing to feed back for OCM 120, and vice versa
with respect to a change to OCM 120 resulting in no change to
gasket logic and interconnects 115-B. Of course, modification
may be made to both OCM 120 and gasket logic and interconnects
115-B, resulting in feedback for both.

[0041] Timing performance analysis process 400 may
continue until, at step 408, each Ti is less than or equal to
Tpath, in which event timing performance analysis process 400
ends at step 409. Notably, timing performance analysis
process 400 works with embedded core 110 formed with a
lithography of a first minimum dimension and OCM 120/gasket
logic and interconnects 115-B formed with a lithography of a
second minimum dimension different from the first minimum
dimension. So, for example, embedded core 110 may be formed
using 0.13 micron lithography and OCM 120/gasket logic and
interconnects 115-B may be formed using 0.18 micron
lithography.

[0042] While the foregoing is directed to the preferred
embodiment of the present invention, other and further 
embodiments of the invention may be devised without departing 
from the basic scope thereof, and the scope thereof is 
determined by the claims that follow. For example, though 
the present invention is described in terms of an FPGA and 
embedded processor core, it should be understood that 
constructs other than an FPGA and an embedded processor core 
may be used, including, but not limited to, combinations 
formed of a programmable logic device and at least one of a 
memory, an Application Specific Integrated Circuit, an 
Application Specific Standard Product, a Digital Signal 
Processor, a microprocessor, a microcontroller, and the like. 
[0043] All trademarks are the respective property of their
owners.